Add configurable MoE runtime flags to MegatronConfig#1213
tyler-griggs wants to merge 2 commits into main
Conversation
Expose 5 MoE runtime config flags as first-class MegatronConfig fields:

- `moe_token_dispatcher_type` (replaces hardcoded `"alltoall"`)
- `moe_router_load_balancing_type` (replaces hardcoded `"none"`)
- `moe_grouped_gemm` (enables fused grouped GEMM for MoE)
- `moe_router_score_function` (e.g. `"sigmoid"` for GLM/DeepSeek-V3)
- `moe_router_enable_expert_bias` (learned bias for load balancing)

Most architecture flags (num_experts, topk, qkv_bias, rotary, etc.) are auto-detected by AutoBridge from the HF config.json. These 5 flags control runtime behavior that is NOT in the HF config.

Advanced/model-specific flags can still be passed through the existing `transformer_config_kwargs` dict, which is applied after these fields.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Code Review
This pull request correctly exposes several Mixture-of-Experts (MoE) runtime flags in MegatronConfig, replacing previous hardcoded values. This improves configurability. The changes are well-implemented across the configuration files, worker logic, and are accompanied by new tests.
My review includes a couple of suggestions for improvement:
- In `megatron_worker.py`, I've suggested a small refactoring to make the application of optional MoE configurations more concise and maintainable.
- In the new test file `test_moe_config.py`, I've pointed out a test case that could be made more exhaustive by adding assertions for all the fields being tested.
Overall, this is a solid contribution that enhances the flexibility of MoE configurations.
```python
if megatron_config.moe_router_score_function is not None:
    provider.moe_router_score_function = megatron_config.moe_router_score_function
if megatron_config.moe_router_enable_expert_bias is not None:
    provider.moe_router_enable_expert_bias = megatron_config.moe_router_enable_expert_bias
```
To improve maintainability and reduce repetition, you can loop over the optional MoE configuration fields. This makes it easier to add more optional fields in the future.
```diff
-if megatron_config.moe_router_score_function is not None:
-    provider.moe_router_score_function = megatron_config.moe_router_score_function
-if megatron_config.moe_router_enable_expert_bias is not None:
-    provider.moe_router_enable_expert_bias = megatron_config.moe_router_enable_expert_bias
+for field_name in ("moe_router_score_function", "moe_router_enable_expert_bias"):
+    value = getattr(megatron_config, field_name)
+    if value is not None:
+        setattr(provider, field_name, value)
```
```python
assert cfg.moe_grouped_gemm is True
assert cfg.moe_router_score_function == "sigmoid"
assert cfg.moe_router_enable_expert_bias is True
```
This test is not exhaustive. It's good practice to assert every field being set from the dictionary, to verify that `build_nested_dataclass` handles all of the new fields. The assertions for `moe_token_dispatcher_type` and `moe_router_load_balancing_type` are missing.
```diff
+assert cfg.moe_token_dispatcher_type == "alltoall"
+assert cfg.moe_router_load_balancing_type == "none"
 assert cfg.moe_grouped_gemm is True
 assert cfg.moe_router_score_function == "sigmoid"
 assert cfg.moe_router_enable_expert_bias is True
```
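As a runnable illustration of the exhaustive version of this test, the sketch below uses a plain dataclass stand-in for `MegatronConfig` and assumes `build_nested_dataclass` behaves like constructing the dataclass from a dict of field values (the real helper also handles nesting, which is omitted here):

```python
from dataclasses import dataclass
from typing import Optional

# Minimal stand-in for MegatronConfig with the five new MoE fields.
@dataclass
class MegatronConfig:
    moe_token_dispatcher_type: str = "alltoall"
    moe_router_load_balancing_type: str = "none"
    moe_grouped_gemm: bool = False
    moe_router_score_function: Optional[str] = None
    moe_router_enable_expert_bias: Optional[bool] = None

raw = {
    "moe_token_dispatcher_type": "alltoall",
    "moe_router_load_balancing_type": "none",
    "moe_grouped_gemm": True,
    "moe_router_score_function": "sigmoid",
    "moe_router_enable_expert_bias": True,
}
cfg = MegatronConfig(**raw)

# Assert every field set from the dict, not just a subset.
assert cfg.moe_token_dispatcher_type == "alltoall"
assert cfg.moe_router_load_balancing_type == "none"
assert cfg.moe_grouped_gemm is True
assert cfg.moe_router_score_function == "sigmoid"
assert cfg.moe_router_enable_expert_bias is True
```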
…oad_balancing_type in test_moe_config_from_dict

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Exposes 5 MoE runtime config flags as first-class fields on the `MegatronConfig` dataclass, replacing 2 hardcoded values in the Megatron worker.

New fields

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| `moe_token_dispatcher_type` | `str` | `"alltoall"` | |
| `moe_router_load_balancing_type` | `str` | `"none"` | |
| `moe_grouped_gemm` | `bool` | `False` | |
| `moe_router_score_function` | `Optional[str]` | `None` | `"sigmoid"` for GLM/DeepSeek-V3 |
| `moe_router_enable_expert_bias` | `Optional[bool]` | `None` | |

Architecture flags are auto-detected by AutoBridge from the HF config.json — no explicit config needed.
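To show how the new fields fit together, here is a hedged sketch of configuring a sigmoid-router MoE model such as GLM or DeepSeek-V3. The dataclass is a stand-in for the real `MegatronConfig` (field names and defaults follow this PR; the `transformer_config_kwargs` escape hatch is included for illustration):

```python
from dataclasses import dataclass, field
from typing import Optional

# Stand-in for the real MegatronConfig; field names/defaults follow this PR.
@dataclass
class MegatronConfig:
    moe_token_dispatcher_type: str = "alltoall"
    moe_router_load_balancing_type: str = "none"
    moe_grouped_gemm: bool = False
    moe_router_score_function: Optional[str] = None
    moe_router_enable_expert_bias: Optional[bool] = None
    # Escape hatch for advanced flags, applied after the fields above.
    transformer_config_kwargs: dict = field(default_factory=dict)

# Runtime behavior for a GLM/DeepSeek-V3-style sigmoid router with
# fused grouped GEMM and a learned load-balancing bias.
cfg = MegatronConfig(
    moe_grouped_gemm=True,
    moe_router_score_function="sigmoid",
    moe_router_enable_expert_bias=True,
)

assert cfg.moe_token_dispatcher_type == "alltoall"  # defaults preserved
assert cfg.moe_router_load_balancing_type == "none"
```

Architecture-level settings (expert count, top-k, etc.) are intentionally absent: those come from the HF config.json via AutoBridge.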